REPLICATION INFORMATION

These replication files allow one to replicate all tables, figures, and results reported in Bagozzi, Berliner, and Almquist's
	"When Does Open Government Shut? Predicting Government Responses to Citizen Information Requests," 
	as well as those reported in its Online Appendix.

This includes:

1. An "Article and Supplemental Appendix" subfolder, which contains PDFs of the final version of the article and online appendix mentioned above.


2. A "Main Article" subfolder, which contains code and data corresponding to the main paper's Table 1, Table 2, Table 3, and Table 4, as well as the 
	main paper's Figure 1, Figure 2, and Figure 3. More specifically:
	a) sLDA_model_fit_pt_I.R and sLDA_model_fit_pt_II.R produce the output (X.validation.csv and X.validation.pr.csv) needed for plotting Figure 1
	b) sLDA_mode_fit_pt_III.R produces the output (X.X.1.validation.csv and X.X.1.validation.pr.csv) needed for plotting Figure 2
	c) Figures_1_and_2.R reads in the files produced under a and b to generate Figures 1 and 2.
	d) SLDA_full_in_out_sample_model.R estimates the main sLDA model whose output is used in Tables 1-4 & Figure 3 (via "ModelOutput" and "Figure_3_input"),
		while also producing in-sample and out-of-sample predictions ("bad.in.sample.medium.csv" & "bad.out.sample.medium.csv") for Tables 3-4.
	e) Figure_3.R reads in "Figure_3_input" and produces Figure 3
	f) Table_1.R reads in "ModelOutput" and produces the Spanish language version of Table 1 (which is human translated to English for Table 1 itself)
	g) Table_2.R reads in "ModelOutput" and produces Table 2
	h) Table_3_and_Table_4.R reads in "bad.in.sample.medium.csv" & "bad.out.sample.medium.csv" to produce Tables 3 and 4.
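
	The dependencies among the scripts listed under a-h suggest one workable run order. The R sketch below is only an illustration: the script
	names are copied verbatim from the list above, but the order is inferred from the stated inputs and outputs, and run_in_order() is a 
	hypothetical helper that is not part of the replication files. It assumes each script can be sourced from within the "Main Article" 
	subfolder, and the model-fitting scripts may still stop where they require the raw text described in the Note below.

```r
# Script names are copied from the list above; the run order is inferred
# from the stated inputs and outputs, not taken from the authors.
main_article_scripts <- c(
  "sLDA_model_fit_pt_I.R",
  "sLDA_model_fit_pt_II.R",
  "sLDA_mode_fit_pt_III.R",
  "Figures_1_and_2.R",
  "SLDA_full_in_out_sample_model.R",
  "Figure_3.R",
  "Table_1.R",
  "Table_2.R",
  "Table_3_and_Table_4.R"
)

# Hypothetical helper: source each script that exists in `dir`, in order,
# and return the names of the scripts that were actually run.
run_in_order <- function(scripts, dir = ".") {
  ran <- character(0)
  for (s in scripts) {
    path <- file.path(dir, s)
    if (!file.exists(path)) {
      message("Skipping missing script: ", s)
      next
    }
    message("Running: ", s)
    # chdir = TRUE lets relative file paths inside a script resolve
    # against the script's own folder; each script gets a fresh environment.
    source(path, local = new.env(), chdir = TRUE)
    ran <- c(ran, s)
  }
  invisible(ran)
}
```

	Calling run_in_order(main_article_scripts, "Main Article") from the top-level folder would then attempt the scripts in this order.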


3. A "Supplemental Appendix" subfolder, whose subfolders contain the materials for reproducing the Online Appendix's Tables A.1-A.21 and Figures A.1-A.3.
	More specifically:
	a) The "Figure A1 and A2" subfolder contains the code used in generating the auxiliary topic & hyperparameter selection routine, and the data for 
		plotting the resultant auxiliary topic & hyperparameter output via "Figure A1.R" and "Figure A2.R" (also contained within this subfolder)
	b) The "Table A1 to Table A5" subfolder contains the model output from "ModelOutput" (produced by SLDA_full_in_out_sample_model.R) and scripts that
		use this output to reproduce the Spanish-language (stemmed and unstemmed) topwords reported in Tables A.1, A.2, A.4, and A.5 (Table A.3 is 
		human translated to English from Table A.4)
	c) The "Table A6 and Figure A3" subfolder contains the code and inputs needed for the over-time assessments of topic variation. Note that the input
		needed for the R scripts in this folder ("badresponseForPlot.csv") is produced by Table_2.R and is also included in this subfolder
	d) The "Table A7 to Table A12" subfolder contains the model output from "ModelOutput" (produced by SLDA_full_in_out_sample_model.R) and scripts that
		use this output to reproduce the Spanish-language (stemmed and unstemmed) topwords reported in Tables A.8, A.9, A.11, and A.12 (Tables A.7
		and A.10 are human translated to English from Tables A.8 and A.11, respectively), which correspond to the 'middle' request topics
	e) The "Table A13 and A14" subfolder re-estimates the sLDA model when excluding agency names as text features (via "SLDA_Model_Excluding_Agencies.R") and
		uses the predictive output from this model ("bad.in.sample.just.text" & "bad.out.sample.just.text") to generate the classification metrics
		reported in Tables A.13 and A.14 via "Table_A13_and_Table_A14.R"
	f) The "Table A15" subfolder estimates a series of comparison classifiers (sLDA, Lasso, Ridge, & Logit), storing each model's out-of-sample predictions
		for comparison via "Table_A15.R"
	g) The "Table A16 to A21" subfolder estimates the STM discussed in the Appendix, and extracts the relevant topwords used in Tables A.16-A.21 (Tables A.16 and
		A.19 are again human translated to English)
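
	Several of the scripts above turn stored predictions into classification metrics. As a rough illustration of that step, the R sketch below 
	computes accuracy, precision, and recall from two binary vectors. The toy data, the function name, and the choice of these three metrics 
	are assumptions made for illustration; the metrics actually reported are defined in the article and its Online Appendix.

```r
# Hypothetical helper: basic metrics for a binary classifier, where 1 marks
# the positive class in both `actual` and `predicted`.
classification_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  tp <- sum(actual == 1 & predicted == 1)  # true positives
  fp <- sum(actual == 0 & predicted == 1)  # false positives
  fn <- sum(actual == 1 & predicted == 0)  # false negatives
  tn <- sum(actual == 0 & predicted == 0)  # true negatives
  c(
    accuracy  = (tp + tn) / length(actual),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn)
  )
}

# Toy vectors standing in for observed outcomes and out-of-sample predictions.
actual    <- c(1, 0, 1, 1, 0, 0)
predicted <- c(1, 0, 0, 1, 1, 0)
m <- classification_metrics(actual, predicted)
print(round(m, 3))
```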


Note: As several of the R scripts discussed above note, we do not provide the raw text associated with Mexico's ATI requests due to privacy and file-size concerns.
	This includes:
	a) all attachdataXXXX.csv files (the original INFOMEX ATI request response data, with attachment texts scraped and merged in),
	b) the complete.text.data.txt file (a file of all words contained in the request texts from all attachdataXXXX.csv files), and
	c) the fulldataXXXX.csv files (the preprocessed documents from attachdataXXXX.csv).
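
	Before running scripts that depend on these withheld files, it may help to check which of them are present locally. The R sketch below is 
	one way to do so. find_raw_inputs() is a hypothetical helper, and the patterns only assume that the file names begin with "attachdata" 
	or "fulldata" and end in ".csv", as the placeholders above suggest.

```r
# Hypothetical helper: report which of the withheld raw-text inputs
# are present in `dir`.
find_raw_inputs <- function(dir = ".") {
  files <- list.files(dir)
  list(
    attachdata    = grep("^attachdata.*\\.csv$", files, value = TRUE),
    fulldata      = grep("^fulldata.*\\.csv$", files, value = TRUE),
    complete_text = "complete.text.data.txt" %in% files
  )
}

# List whatever is available in the current working directory.
str(find_raw_inputs("."))
```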

	The Supplemental Appendix contains details on how the raw text in attachdataXXXX.csv was processed to create the preprocessed text contained in the
	fulldataXXXX.csv files.
	
	One can recover the original ATI request texts and original attachment files contained in attachdataXXXX.csv via the "Solicitudes y respuestas por periodo" 
	section of this page: https://www.infomex.org.mx/gobiernofederal/homeOpenData.action

	At the time of the analysis, with the exception of 2004, each access to information (ATI) request in our sample was given a unique ID number by
	the Mexican government, known as a 'FOLIO ID'. In some cases, our replication data include these FOLIO IDs. For the requests arising from 2004, 
	we use the original row numbers from the Mexican government's 2004 ATI request-response CSV file as the FOLIO IDs for that year, as obtained
	from the infomex page mentioned above.
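
	The row-number convention for 2004 can be sketched in R as follows. The data frame and its column names are hypothetical stand-ins, 
	since the layout of the government's 2004 CSV file is not described here.

```r
# Toy stand-in for the 2004 request-response CSV; the column names are
# hypothetical and do not reflect the real file's layout.
requests_2004 <- data.frame(
  request_text = c("solicitud A", "solicitud B", "solicitud C"),
  stringsAsFactors = FALSE
)

# Use each request's original row position as its identifier.
requests_2004$FOLIO <- seq_len(nrow(requests_2004))
print(requests_2004)
```

	Under this convention the identifiers only match if they are assigned before any filtering or re-sorting of the original file.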

All R packages used correspond to the latest versions available as of September 2017.
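
To compare a local setup against that date, one option is to list the installed version of each package a script loads. The R sketch below is 
only an illustration: report_versions() is a hypothetical helper, and the package names in the example call are placeholders rather than the 
packages the scripts actually use.

```r
# Hypothetical helper: for each package name, report whether it is installed
# and, if so, which version is found first on the library path.
report_versions <- function(pkgs) {
  inst <- utils::installed.packages()
  data.frame(
    package   = pkgs,
    installed = pkgs %in% rownames(inst),
    version   = unname(inst[match(pkgs, rownames(inst)), "Version"]),
    stringsAsFactors = FALSE
  )
}

# Placeholder package names; substitute those loaded by the scripts.
print(report_versions(c("stats", "utils")))
```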

Please contact Benjamin E. Bagozzi (bagozzib@udel.edu) with any questions about these data.